238 research outputs found

A Preconditioned Hybrid SVD Method for Computing Accurately Singular Triplets of Large Matrices

Authors: Stathopoulos Andreas; Wu Lingfei
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 13/05/2015

The computation of a few singular triplets of large, sparse matrices is a challenging task, especially when the smallest magnitude singular values are needed in high accuracy. Most recent efforts try to address this problem through variations of the Lanczos bidiagonalization method, but they are still challenged even for medium matrix sizes due to the difficulty of the problem. We propose a novel SVD approach that can take advantage of preconditioning and of any well designed eigensolver to compute both largest and smallest singular triplets. Accuracy and efficiency are achieved through a hybrid, two-stage meta-method, PHSVDS. In the first stage, PHSVDS solves the normal equations up to the best achievable accuracy. If further accuracy is required, the method switches automatically to an eigenvalue problem with the augmented matrix. Thus it combines the advantages of the two stages, faster convergence and accuracy, respectively. For the augmented matrix, solving the interior eigenvalue problem is facilitated by a proper use of the good initial guesses from the first stage and an efficient implementation of the refined projection method. We also discuss how to precondition PHSVDS and to cope with some issues that arise. Numerical experiments illustrate the efficiency and robustness of the method.

Comment: 24 pages, 20 figures, and 8 tables. Accepted to SIAM Journal on Scientific Computing
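The two formulations named in this abstract can be illustrated on a tiny dense matrix. The sketch below is not PHSVDS: it uses plain power iteration as a stand-in for a real eigensolver, and the matrix `A` and all function names are hypothetical. It only shows that a singular triplet found from the normal equations (eigenpairs of A^T A, with eigenvalue sigma^2) is also an eigenpair of the augmented matrix [[0, A^T], [A, 0]] with eigenvalue sigma.

```python
import math

def matvec(M, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def norm(x):
    return math.sqrt(sum(x_i * x_i for x_i in x))

def largest_triplet_normal_equations(A, iterations=200):
    """Power iteration on A^T A (a stand-in for a real eigensolver)."""
    At = transpose(A)
    v = [1.0] * len(A[0])
    for _ in range(iterations):
        w = matvec(At, matvec(A, v))
        n = norm(w)
        v = [w_i / n for w_i in w]
    Av = matvec(A, v)
    sigma = norm(Av)                  # ||A v|| = sigma because ||v|| = 1
    u = [y / sigma for y in Av]       # left singular vector
    return sigma, u, v

def augmented_matrix(A):
    """Build B = [[0, A^T], [A, 0]] for an m x n matrix A."""
    m, n = len(A), len(A[0])
    At = transpose(A)
    top = [[0.0] * n + At[i] for i in range(n)]
    bottom = [A[i] + [0.0] * m for i in range(m)]
    return top + bottom

# Hypothetical 3 x 2 test matrix; A^T A = [[5, 1], [1, 2]].
A = [[2.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
sigma, u, v = largest_triplet_normal_equations(A)

# The stacked vector [v; u] should satisfy B z = sigma z.
z = v + u
Bz = matvec(augmented_matrix(A), z)
residual = [b - sigma * z_i for b, z_i in zip(Bz, z)]
print(sigma, max(abs(r) for r in residual))
```

The normal-equations form works with sigma^2 rather than sigma, which is commonly cited as the reason its attainable accuracy is limited for the smallest singular values; the augmented form avoids the squaring at the cost of an interior eigenvalue problem.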

arXiv.org e-Print Archive

How Web 1.0 Fails: The Mismatch Between Hyperlinks and Clickstreams

Authors: Ackland Robert; Wu Lingfei
Publication date: 29/07/2013

The core of the Web is a hyperlink navigation system collaboratively set up by webmasters to help users find desired websites. But does this system really work as expected? We show that the answer seems to be negative: there is a substantial mismatch between hyperlinks and the pathways that users actually take. A closer look at empirical surfing activities reveals the reason for the mismatch: webmasters try to build a global virtual world without geographical or cultural boundaries, but users in fact prefer to navigate within more fragmented, language-based groups of websites. We call this type of behavior "preferential navigation" and find that it is driven by "local" search engines.

Comment: 12 pages, 4 figures

arXiv.org e-Print Archive

The Decentralized Structure of Collective Attention on the Web

Authors: Wu Lingfei; Zhang Jiang
Publication date: 06/11/2012

Background: The collective browsing behavior of users gives rise to a flow network transporting attention between websites. By analyzing the structure of this network we uncovered a nontrivial scaling regularity concerning the impact of websites. Methodology: We constructed three clickstream networks, whose nodes were websites and whose edges were formed by users switching between sites. We developed an indicator $C_i$ as a measure of the impact of site $i$ and investigated its correlation with the traffic of the site, $A_i$, both on the three networks and across the language communities within the networks. Conclusions: We found that the impact of websites increased more slowly than their traffic. Specifically, there existed a scaling relationship between $C_i$ and $A_i$ with an exponent $\gamma$ smaller than 1. We suggested that this scaling relationship characterized the decentralized structure of the clickstream circulation: the World Wide Web is a system that favors small sites in reassigning the collective attention of users.

Comment: 12 pages, 7 figures
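A scaling relationship of this kind, impact C_i proportional to traffic A_i raised to an exponent gamma, is often estimated by a straight-line fit in log-log space. The sketch below assumes ordinary least squares on the logarithms, which the abstract does not specify, and runs on synthetic data generated with a hypothetical exponent of 0.8; it does not use the paper's data or its definition of C_i.

```python
import math

def fit_scaling_exponent(traffic, impact):
    """Least-squares fit of log(impact) = log(prefactor) + gamma * log(traffic)."""
    xs = [math.log(a) for a in traffic]
    ys = [math.log(c) for c in impact]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                      # slope = scaling exponent
    prefactor = math.exp(mean_y - gamma * mean_x)
    return gamma, prefactor

# Synthetic, noise-free data: impact = 2 * traffic^0.8 (hypothetical values).
traffic = [10.0, 100.0, 1000.0, 10000.0]
impact = [2.0 * a ** 0.8 for a in traffic]

gamma, prefactor = fit_scaling_exponent(traffic, impact)
print(gamma, prefactor)
```

An exponent below 1 means that multiplying a site's traffic by some factor multiplies its impact by a smaller factor, which is the sublinear growth the conclusions describe.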

arXiv.org e-Print Archive

Tracing the Attention of Moving Citizens

Authors: Wang Cheng-Jun; Wu Lingfei
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/07/2016

With the widespread use of mobile computing devices in contemporary society, our trajectories in the physical space and virtual world are increasingly closely connected. Using the anonymous smartphone data of $1 \times 10^5$ users in 30 days, we constructed the mobility network and the attention network to study the correlations between online and offline human behaviours. In the mobility network, nodes are physical locations and edges represent the movements between locations, and in the attention network, nodes are websites and edges represent the switch of users between websites. We apply the box-covering method to renormalise the networks. The investigated network properties include the size of box $l_B$ and the number of boxes $N(l_B)$. We find two universal classes of behaviours: the mobility network is featured by a small-world property, $N(l_B) \simeq e^{-l_B}$, whereas the attention network is characterised by a self-similar property, $N(l_B) \simeq l_B^{-\gamma}$. In particular, as the box length $l_B$ increases, the degree correlation of the network changes from positive to negative, which indicates that there are two layers of structure in the mobility network. We use the results of network renormalisation to detect the community and map the structure of the mobility network. Further, we located the most relevant websites visited in these communities, and identified three typical location-based behaviours: shopping, dating, and taxi-calling. Finally, we offered a revised geometric network model to explain our findings from the perspective of spatially constrained attachment.

Comment: 15 pages, 8 figures
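The abstract names the box-covering method without stating which variant is used. As a minimal sketch, the code below assumes one common convention: a box of size l_B is a set of nodes whose pairwise shortest-path distances are all smaller than l_B, and boxes are assigned by greedy colouring (two nodes at distance l_B or more may not share a colour). The path graph and the function names are hypothetical, and a greedy cover only gives an upper bound on the minimal number of boxes.

```python
from collections import deque

def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def count_boxes(adjacency, box_size):
    """Greedy box covering: nodes at distance >= box_size get different boxes."""
    nodes = sorted(adjacency)
    dist = {n: shortest_path_lengths(adjacency, n) for n in nodes}
    box_of = {}
    for node in nodes:
        # Boxes already holding a node that is too far away are forbidden.
        forbidden = {
            box_of[other]
            for other in box_of
            if dist[node].get(other, float("inf")) >= box_size
        }
        box = 0
        while box in forbidden:
            box += 1
        box_of[node] = box
    return len(set(box_of.values()))

# Hypothetical toy graph: a path 0 - 1 - 2 - 3 - 4 - 5.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
counts = {l_b: count_boxes(path, l_b) for l_b in (1, 2, 3, 6)}
print(counts)
```

Repeating the count over a range of box sizes gives the curve N(l_B), whose decay (exponential or power law) is what separates the two classes of behaviour described in the abstract.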

arXiv.org e-Print Archive

Attention Dynamics in Collaborative Knowledge Creation

Authors: Janssen Marco A.; Wu Lingfei
Publication date: 24/11/2015

To uncover the mechanisms underlying the collaborative production of knowledge, we investigate a very large online Question and Answer system that includes the question asking and answering activities of millions of users over five years. We created knowledge networks in which nodes are questions and edges are the successive answering activities of users. We find that these networks have two common properties: 1) the mitigation of degree inequality among nodes; and 2) the assortative mixing of nodes. This means that, while the system tends to reduce attention investment on old questions in order to supply sufficient attention to new questions, it is not easy for novel knowledge to be integrated into the existing body of knowledge. We propose a mixing model to combine preferential attachment and reversed preferential attachment processes to model the evolution of knowledge networks and successfully reproduce the observed patterns. Our mixing model is not only theoretically interesting but also provides insights into the management of online communities.

Comment: 11 pages, 3 figures

arXiv.org e-Print Archive

Revisiting Spectral Graph Clustering with Generative Community Models

Authors: Chen Pin-Yu; Wu Lingfei
Publication date: 05/10/2017

The methodology of community detection can be divided into two principles: imposing a network model on a given graph, or optimizing a designed objective function. The former provides guarantees on theoretical detectability but falls short when the graph is inconsistent with the underlying model. The latter is model-free but fails to provide quality assurance for the detected communities. In this paper, we propose a novel unified framework to combine the advantages of these two principles. The presented method, SGC-GEN, not only considers the detection error caused by the corresponding model mismatch to a given graph, but also yields a theoretical guarantee on community detectability by analyzing Spectral Graph Clustering (SGC) under GENerative community models (GCMs). SGC-GEN incorporates the predictability on correct community detection with a measure of community fitness to GCMs. It resembles the formulation of supervised learning problems by enabling various community detection loss functions and model mismatch metrics. We further establish a theoretical condition for correct community detection using the normalized graph Laplacian matrix under a GCM, which provides a novel data-driven loss function for SGC-GEN. In addition, we present an effective algorithm to implement SGC-GEN, and show that the computational complexity of SGC-GEN is comparable to the baseline methods. Our experiments on 18 real-world datasets demonstrate that SGC-GEN possesses superior and robust performance compared to 6 baseline methods under 7 representative clustering metrics.

Comment: Accepted by IEEE International Conference on Data Mining (ICDM) 2017 as a regular paper - full paper with supplementary material
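For orientation, plain spectral graph clustering with the normalized graph Laplacian can be sketched in a few lines. This is the textbook two-way split by the sign of the Fiedler vector, not SGC-GEN, and the toy graph (two cliques joined by one edge) and function names are hypothetical. The eigenvector is obtained by power iteration on 2I - L with the trivial eigenvector projected out, which assumes every node has nonzero degree.

```python
import math

def fiedler_partition(adjacency, iterations=500):
    """Two-way split by the sign of the second eigenvector of L = I - D^-1/2 A D^-1/2."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]          # assumed nonzero
    S = [[adjacency[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    # Eigenvector of L with eigenvalue 0 is D^{1/2} 1; normalise it.
    trivial = [math.sqrt(d) for d in degree]
    t_norm = math.sqrt(sum(t * t for t in trivial))
    trivial = [t / t_norm for t in trivial]
    x = [float(i + 1) for i in range(n)]              # deterministic start vector
    for _ in range(iterations):
        # Project out the trivial eigenvector, then apply I + S = 2I - L.
        c = sum(a * b for a, b in zip(x, trivial))
        x = [a - c * b for a, b in zip(x, trivial)]
        x = [x[i] + sum(S[i][j] * x[j] for j in range(n)) for i in range(n)]
        x_norm = math.sqrt(sum(a * a for a in x))
        x = [a / x_norm for a in x]
    return [0 if a < 0 else 1 for a in x]

def two_cliques(size):
    """Adjacency matrix of two cliques of the given size joined by one edge."""
    n = 2 * size
    A = [[0] * n for _ in range(n)]
    for start in (0, size):
        for i in range(start, start + size):
            for j in range(start, start + size):
                if i != j:
                    A[i][j] = 1
    A[size - 1][size] = A[size][size - 1] = 1         # bridge edge
    return A

labels = fiedler_partition(two_cliques(4))
print(labels)
```

On a graph with clear community structure the sign pattern recovers the two groups; the abstract's concern is what can be guaranteed when the graph only approximately follows a generative community model.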

arXiv.org e-Print Archive

Deep Graph Translation

Authors: Guo Xiaojie; Wu Lingfei; Zhao Liang
Publication date: 22/06/2018

Inspired by the tremendous success of deep generative models on generating continuous data such as images and audio, over the past year a few deep graph generative models have been proposed to generate discrete data such as graphs. They are typically unconditioned generative models, which have no control over the modes of the graphs being generated. In contrast, in this paper we are interested in a new problem named Deep Graph Translation: given an input graph, we want to infer a target graph based on their underlying (both global and local) translation mapping. Graph translation could be highly desirable in many applications such as disaster management and rare event forecasting, where the rare and abnormal graph patterns (e.g., traffic congestion and terrorism events) will be inferred prior to their occurrence even without historical data on the abnormal patterns for this graph (e.g., a road network or human contact network). To achieve this, we propose novel Graph-Translation-Generative Adversarial Networks (GT-GAN), which will generate a graph translator from input to target graphs. GT-GAN consists of a graph translator where we propose new graph convolution and deconvolution layers to learn the global and local translation mapping. A new conditional graph discriminator has also been proposed to classify target graphs by conditioning on input graphs. Extensive experiments on multiple synthetic and real-world datasets demonstrate the effectiveness and scalability of the proposed GT-GAN.

Comment: 9 pages, 4 figures, 4 tables

arXiv.org e-Print Archive

PRIMME_SVDS: A High-Performance Preconditioned SVD Solver for Accurate Large-Scale Computations

Authors: Romero Eloy; Stathopoulos Andreas; Wu Lingfei
Publication date: 24/01/2017

The increasing number of applications requiring the solution of large-scale singular value problems has rekindled interest in iterative methods for the SVD. Some promising recent advances in large-scale iterative methods are still plagued by slow convergence and accuracy limitations for computing smallest singular triplets. Furthermore, their current implementations in MATLAB cannot address the required large problems. Recently, we presented a preconditioned, two-stage method to effectively and accurately compute a small number of extreme singular triplets. In this research, we present high-performance software, PRIMME_SVDS, that implements our hybrid method based on the state-of-the-art eigensolver package PRIMME for both largest and smallest singular values. PRIMME_SVDS fills a gap in production-level software for computing the partial SVD, especially with preconditioning. The numerical experiments demonstrate its superior performance compared to other state-of-the-art software and its good parallel performance under strong and weak scaling.

Comment: 23 pages, 10 figures

arXiv.org e-Print Archive

TRPL+K: Thick-Restart Preconditioned Lanczos+K Method for Large Symmetric Eigenvalue Problems

Authors: Stathopoulos Andreas; Wu Lingfei; Xue Fei
Publication date: 15/07/2018

The Lanczos method is one of the standard approaches for computing a few eigenpairs of a large, sparse, symmetric matrix. It is typically used with restarting to avoid unbounded growth of memory and computational requirements. Thick-restart Lanczos is a popular restarted variant because of its simplicity and numerical robustness. However, convergence can be slow for highly clustered eigenvalues, so more effective restarting techniques and the use of preconditioning are needed. In this paper, we present a thick-restart preconditioned Lanczos method, TRPL+K, that combines the power of locally optimal restarting (+K) and preconditioning techniques with the efficiency of the thick-restart Lanczos method. TRPL+K employs an inner-outer scheme where the inner loop applies Lanczos on a preconditioned operator while the outer loop augments the resulting Lanczos subspace with certain vectors from the previous restart cycle to obtain eigenvector approximations with which it thick restarts the outer subspace. We first identify the differences from various relevant methods in the literature. Then, based on an optimization perspective, we show an asymptotic global quasi-optimality of a simplified TRPL+K method compared to an unrestarted globally optimal method. Finally, we present extensive experiments showing that TRPL+K either outperforms or matches other state-of-the-art eigenmethods in both matrix-vector multiplications and computational time.

Comment: 27 pages, 6 figures, 7 tables. Submitted to SIAM Journal on Scientific Computing, Minor Revision
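As background for the method this abstract builds on, the basic Lanczos recurrence can be sketched as follows. This is the plain, unrestarted, unpreconditioned iteration with full reorthogonalization on a small dense matrix; it is not TRPL+K, and the test matrix and function names are hypothetical. It builds an orthonormal basis Q and the diagonal (`alphas`) and off-diagonal (`betas`) entries of the tridiagonal projection T = Q^T A Q.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def lanczos(A, start, steps):
    """Plain Lanczos with full reorthogonalization for a symmetric matrix A."""
    start_norm = math.sqrt(dot(start, start))
    basis = [[s / start_norm for s in start]]
    alphas, betas = [], []
    for j in range(steps):
        w = matvec(A, basis[j])
        alphas.append(dot(w, basis[j]))               # diagonal entry of T
        # Orthogonalize against every basis vector built so far.
        for q in basis:
            c = dot(w, q)
            w = [w_i - c * q_i for w_i, q_i in zip(w, q)]
        beta = math.sqrt(dot(w, w))
        if j == steps - 1 or beta < 1e-12:            # done, or invariant subspace
            break
        betas.append(beta)                            # off-diagonal entry of T
        basis.append([w_i / beta for w_i in w])
    return alphas, betas, basis

# Hypothetical symmetric test matrix.
A = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 5.0]]
alphas, betas, basis = lanczos(A, [1.0, 1.0, 1.0, 1.0], steps=3)
print(alphas, betas)
```

The eigenvalues of the small tridiagonal matrix (Ritz values) approximate extreme eigenvalues of A. Because the basis grows by one vector per step, practical codes restart, which is the setting the thick-restart variants in the abstract address.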

arXiv.org e-Print Archive

Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward

Authors: Huang Luyang; Wang Lu; Wu Lingfei
Publication date: 03/05/2020

Sequence-to-sequence models for abstractive summarization have been studied extensively, yet the generated summaries commonly suffer from fabricated content, and are often found to be near-extractive. We argue that, to address these issues, the summarizer should acquire semantic interpretation over input, e.g., via structured representation, to allow the generation of more informative summaries. In this paper, we present ASGARD, a novel framework for Abstractive Summarization with Graph-Augmentation and semantic-driven RewarD. We propose the use of dual encoders (a sequential document encoder and a graph-structured encoder) to maintain the global context and local characteristics of entities, complementing each other. We further design a reward based on a multiple choice cloze test to drive the model to better capture entity interactions. Results show that our models produce significantly higher ROUGE scores than a variant without knowledge graph as input on both New York Times and CNN/Daily Mail datasets. We also obtain better or comparable performance compared to systems that are fine-tuned from large pretrained language models. Human judges further rate our model outputs as more informative and containing fewer unfaithful errors.

Comment: Accepted as a long paper to ACL 2020

arXiv.org e-Print Archive